Abstract: We consider the allocation of spectral and power 
resources to the mobiles (i.e., user equipment (UE)) in a cell every 
subframe (1 ms) for the Long Term Evolution (LTE) orthogonal 
frequency division multiple access (OFDMA) cellular network. 
To enable scheduling based on packet delays, we design a novel 
mechanism for inferring the packet delays approximately from 
the buffer status reports (BSR) transmitted by the UEs; the 
BSR reports only contain queue length information. We then 
consider a constrained optimization problem with a concave 
objective function - schedulers such as those based on utility 
maximization, maximum weight scheduling, and recent results on 
iterative scheduling for small queue/delay follow as special cases. 
In particular, the construction of the non-differentiable objective 
function based on packet delays is novel. We model constraints 
on bandwidth, peak transmit power at the UE, and the transmit 
power spectral density (PSD) at the UE due to fractional power 
control. When frequency diversity doesn't exist or is not exploited 
at a fast time-scale, we use subgradient analysis to construct 
an $O(N \log L)$ (per iteration, with a small number of iterations) 
algorithm to compute the optimal resource allocation for N users 
and L points of non-differentiability in the objective function. For 
a frequency diversity scheduler with M sub-bands, the corresponding 
complexity per iteration is essentially $O(N(M^2 + L^2))$. 
Unlike previous iterative policies based on delay/queue, in our 
approach the complexity of scheduling can be reduced when 
the coherence bandwidth is larger. Through detailed system 
simulations (based on NGMN and 3GPP evaluation methodology) 
which model H-ARQ, finite resource grants per sub-frame, 
deployment, realistic traffic, power limitations, interference, and 
channel fading, we demonstrate the effectiveness of our schemes 
for LTE. 

I. Introduction 

Wideband cellular systems such as LTE allow for resource 
allocation with high granularity of a resource block (RB) 
of 1 ms by 180 KHz 0. While control signalling and the 
general framework for the physical and medium access control 
(MAC) layers is specified to enable efficient use of spectral 
resources, the exact resource allocation algorithms for power 
and frequency allocation can be designed by an implementor. 
Moreover, each cell can serve on the order of a thousand active 
connections over a bandwidth of 20 MHz. Hence, in order to 
take advantage of the flexibility allowed in resource allocation, 
the resource allocation algorithms have to be computationally 
simple. Many schedulers in the literature entail maximizing 
the weighted sum of rates in each subframe. For example, 
the weights could be based on utility functions of average 
rate 0, 0, the queue length 0, 0, or head-of-line de- 
lay 0, 0. In the uplink, the resource allocation problem must 
consider the maximum transmission power of a mobile and the 
constraints on the transmission power imposed by fractional 
power control to limit inter-cell interference 0, 0. When 



contiguous bandwidth allocation is considered, the problem of 
maximizing the weighted sum rate in each subframe on the UL 
can be posed as a constrained convex optimization problem. 
For N users and M sub-bands, general purpose methods 
can solve the problem in $O((NM)^3)$. With peak UE power 
constraints, an $O(NM)$ per iteration subgradient algorithm was 
obtained in [9]; heuristics to compute allocations with an integral 
number of resource blocks (RBs) were considered as well. 
Interior point methods (which have faster convergence) with 
an $O(NM^2)$ (if $N \gg M$) Newton iteration were obtained 
in [10] for uplink resource allocation with additional fractional 
power control constraints. However, non-differentiable objective 
functions are not considered under the framework in [10]. 

Also relevant to our paper are recent results on low complexity 
iterative scheduling algorithms. Many papers prior to 
these results had considered scheduling to maximize the sum 
of weighted rates in subframe n, where the weights were based 
on the arrivals and departures in the queue of a user until 
subframe n - 1. The iterative policies in [11], [12] take into 
account how the weights change in subframe n to determine 
the resource allocation in that subframe. In particular, the 
queue based server side greedy (SSG) rule is proposed for 
multi-rate channels in [12] and a delay based rule with iterative 
matching in each subframe for ON-OFF channels is considered 
in [11]. The results in these papers offer the remarkable insight 
that when the rate grows linearly with bandwidth (no peak 
power constraints at the transmitter), as the number of users 
in the system grows, these rules lead to much smaller per-user 
queues and delays, respectively, compared with previous 
approaches. However, the complexity of these algorithms grows 
with the resource granularity even if the coherence bandwidth 
does not grow. In this paper, we construct a continuous but 
non-differentiable concave reward function based on packet 
delays. We argue that the matching algorithm in [11] is an 
approximate algorithm to maximize this reward function in 
every subframe. 

Motivated by the above observation, we consider re- 
source allocation to maximize a continuous (possibly non- 
differentiable) concave reward function. We first consider a 
channel model where the channel gain in the frequency domain 
is flat and formulate the resource allocation problem as a 
non-differentiable convex optimization problem. Note that in 
typical cellular environments, the channel gains can be fairly 
correlated even for frequencies 2 to 5 MHz apart [13]; 
hence, the assumption of frequency flat fading is a reasonable 
one when the total bandwidth is up to 5 MHz (28 RBs) 
or lower, or if the UEs are allocated to sub-bands (< 5 
MHz) over a slower time-scale based on interference and 
channel statistics. The above assumption allows us to use 
subgradient analysis to design algorithms with O(NlogL) 
cost per iteration (with small number of iterations) for N users 
and L points of non-differentiability in the objective function. 
We discuss implementation issues for the resulting algorithm 
in a practical LTE system with H-ARQ re-transmissions, finite 
number of resource grants per subframe, and the constraint that 
all uplink transmissions have to be over a contiguous set of 
RBs. Notably, we also design a novel mechanism to estimate 
head-of-line delays of queues at UEs with low complexity via 
only queue length information contained in the buffer status 
reports (BSR). We note that our techniques are equally applicable 
for enabling delay based scheduling in the PCF and HCF 
modes in WiFi [14]. We demonstrate the improvement in 
performance due to our techniques through numerical results 
obtained via comprehensive simulations based on 
3GPP evaluation methodology [15]. Finally, when frequency 
selective fading is considered, we show how interior point 
methods with complexity of $O(NM^2 + NL^2)$ per Newton 
iteration can be obtained; note that in practice $N \gg L, M$. 
Since we consider non-differentiable cost functions, this requires 
additional analysis compared to that in [10], where only 
differentiable cost/utility functions were considered. 

Prior work in devising practical resource allocation schemes 
for the LTE uplink includes the following: [16] considers allocation of fixed 
size resource chunks to UEs; [17] extends this approach so that 
each (RB, UE)-tuple is associated with a metric (which cannot 
capture the power constraint at a power limited UE); and a similar 
(RB, UE) metric is considered in [18]. These methods do not 
extend to solving the general resource allocation problem 
considered in this paper. Semi-persistent scheduling for voice over 
IP (VoIP) has been considered in, for example, [19]. In [20], 
heuristics for maximizing utilities of UEs in each subframe 
in the presence of frequency selective fading but no fractional 
power control were considered. Similarly, heuristics to satisfy 
minimum rate constraints of most users and maximize the 
sum rate were considered in [21], heuristics to maximize sum 
weighted rate were designed in [22], and algorithms for long 
term proportional fairness were considered in [23]. 

II. System Model 
A. Channel Model, Power, Rate 

We focus on the uplink of a single cell in LTE with N 
UEs and the total bandwidth divided into M sub-bands of 
equal bandwidth B, with B less than the coherence bandwidth 
of each user. The maximum transmit power of each UE is 
P. The channel gain for UE i on sub-band j is $G_{ij}$; we 
focus on the scheduler computation in a subframe, and don't 
explicitly show the dependence of quantities on time t. The 
base-station can measure the $G_{ij}$'s via decoding the sounding 
reference signal (SRS) [1]. Fractional power control in LTE 
limits the amount of interference a UE causes at base-stations 
in neighboring cells. A UE which is closer to the cell edge 
inverts a smaller fraction of the path loss to the serving 
base-station than a UE which is closer to the serving base- 
station 00. Thus the transmit powers of a UE on different 



sub-bands satisfy [10]: 

$$p_{ij} \le \gamma_{ij}\, b_{ij}, \quad \forall i, j, \qquad \sum_{j=1}^{M} p_{ij} \le P, \quad \forall i,$$

where $b_{ij}$ is the bandwidth allocated to UE i on sub-band j 
and $\gamma_{ij}$ is a sub-band specific constant. 

The interference PSD at the serving base-station on sub- 
band j (denoted as Ij) can be measured by the base-station 
periodically over unassigned frequency resources. The value 
depends on the interference coordination algorithm used [24]. 
When a UE transmits with power $p_{ij}$ over bandwidth $b_{ij}$ on 
sub-band j, it achieves a rate given by (treating interference 
as noise) 

$$b_{ij}\, \psi\!\left( \frac{G_{ij}\, p_{ij}}{I_j\, b_{ij}} \right),$$

where $\psi : \mathbb{R}_+ \to \mathbb{R}_+$ is an increasing, concave, and differentiable 
function which maps the SINR to spectral efficiency. 
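
As a rough illustration of the rate model above, the following sketch evaluates the rate when the UE transmits at the largest power that both the PSD constraint and the peak power constraint allow. The Shannon map used for $\psi$ and all function and parameter names are assumptions made for this example; the text only requires $\psi$ to be increasing, concave, and differentiable.

```python
import math

def shannon_efficiency(sinr):
    """Assumed SINR-to-spectral-efficiency map psi (bits/s/Hz); increasing and concave."""
    return math.log2(1.0 + sinr)

def uplink_rate(b, gain, gamma, p_max, interference, psi=shannon_efficiency):
    """Rate b * psi(G p / (I b)) at the largest power allowed by both constraints."""
    if b <= 0.0:
        return 0.0
    # PSD constraint caps the power at gamma * b; peak constraint caps it at p_max.
    power = min(gamma * b, p_max)
    return b * psi(gain * power / (interference * b))

if __name__ == "__main__":
    # Hypothetical unitless numbers; the UE becomes power limited at b = p_max / gamma = 20.
    for b in (10.0, 20.0, 40.0):
        print(b, uplink_rate(b, gain=2.0, gamma=0.5, p_max=10.0, interference=1.0))
```

Below the threshold $P/\gamma$ the rate grows linearly with bandwidth; above it the SINR per unit bandwidth falls and the rate grows more slowly.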

B. Control Signaling 

Single carrier frequency division multiple access (SC-FDMA) 
is used in the LTE uplink [1], and so a UE can be 
granted a number of 180 kHz resource blocks in a contiguous 
manner in frequency. The resource allocation to the UEs is 
computed by the base-station every subframe (1 ms) and 
signalled to the UEs via resource grants which include the 
contiguous set of RBs allocated to the UE and the modulation 
and coding scheme (MCS). The timeline is as follows: a 
resource grant is made to the UE at time t for an uplink 
transmission at time (t + 4). At time (t + 8) the base-station 
transmits an ACK/NACK to indicate if it could decode the 
packet; if a NACK is received by the UE, it re-transmits at 
the same power and uses the same RBs at time (t + 12) (as at 
time (t + 4)). We assume a constant number of maximum 
allowable re-transmissions for all UEs and do not adapt 
the re-transmission power and resource assignment through 
additional control signalling available in LTE. 

Buffer status report (BSR) and scheduling request (SR) are 
transmitted by the UEs to inform the base-station about new 
packet arrivals at the UE. We describe the mechanism for the 
special case of single logical channel (LC), or connection, 
at each UE. SR is one bit of information used to indicate 
the arrival of packets in an empty buffer at the UE. Each 
UE periodically gets an opportunity to send SR, and the 
time interval between two successive opportunities for SR is 
denoted by $T_{SR}$, and is assumed to be fixed in a cell. BSRs 
contain a quantized value of the number of bytes pending 
transmission at the UE¹, and are generated in two different 
ways: 

Regular BSR: If the queue is empty in subframe t, 
and new packets arrive in subframe t + 1, a regular BSR is 
generated at time t + 1. When a regular BSR is generated, an 
SR is transmitted at the next available SR opportunity unless 
resources are granted to the UE between the BSR generation 
and the opportunity to transmit SR. 

Periodic BSR: A periodic 
BSR is generated every $T_{BSR}$ subframes. A periodic BSR thus 
generated is transmitted by the UE to the base-station at the 
earliest subframe after generation when resources are granted 
to it by the base-station. 

¹We ignore the effect of quantization in BSR, but the methods in this paper 
extend easily to quantized BSR. 

Fig. 1. Uplink timeline 



A typical sequence of transmissions is shown in Fig. 1. 
The buffer at a UE is empty in subframe (t - 1) and a 1000 
byte packet arrives in subframe t. The next SR opportunity 
is subframe (t + 2) - the SR transmission by the UE signals 
to the base-station that the buffer at the UE is non-empty. 
In response, the base-station allocates resource to the UE on 
the uplink via a grant at time (t + 5) - the actual uplink 
transmission occurs 4 subframes later, i.e., in subframe (t + 9). 
This transmission includes the BSR report. Assume that the 
UE is allocated enough resources to also transmit 200 bytes 
of the data packet - then the BSR report will contain a 
value of 800 bytes for the left-over data at the UE. The first 
transmission is unsuccessful - this is indicated by a NACK 
transmitted by the base-station at time (t + 13). The UE 
re-transmits the packet at time (t + 17) - this transmission 
is decoded successfully by the base-station, and hence it 
is known at the base-station that 800 bytes were pending 
transmission at the UE at time (t + 9) which is the time when 
the BSR report was created. 
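
The timing rules of this section (grant, then transmission 4 subframes later, feedback 4 subframes after that, and re-transmission 4 subframes after a NACK) can be sketched as a small helper. The function and constant names are illustrative only and are not part of LTE or of the paper.

```python
GRANT_TO_TX = 4   # subframes from resource grant to uplink transmission
TX_TO_ACK = 4     # subframes from transmission to ACK/NACK
ACK_TO_RETX = 4   # subframes from NACK to re-transmission

def uplink_timeline(grant_time, failed_attempts):
    """Return (transmission times, feedback times) for one grant.

    failed_attempts is the number of NACKed transmissions before the
    attempt that is finally acknowledged.
    """
    tx_times, feedback_times = [], []
    tx = grant_time + GRANT_TO_TX
    for _ in range(failed_attempts + 1):
        tx_times.append(tx)
        feedback_times.append(tx + TX_TO_ACK)
        tx += TX_TO_ACK + ACK_TO_RETX
    return tx_times, feedback_times

if __name__ == "__main__":
    # Example from the text with t = 0: grant at t + 5, one failed attempt.
    print(uplink_timeline(5, 1))
```

With t = 0 this reproduces the sequence in the example: transmissions in subframes 9 and 17, and the NACK in subframe 13.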

III. Reward Functions 

In this section, we define the reward functions that we use 
for the optimization problem and relate it to the schemes 
used in earlier works. We assume each UE to have one active 
LC which supports either best effort or delay QoS traffic. 



A. Best Effort 

A flow, i, which is best-effort is associated with an average 
rate $x_i(t) \in \mathbb{R}_+$ in subframe t which is updated as follows: 

$$x_i(t+1) = (1 - \alpha_i)\, x_i(t) + \alpha_i r_i, \quad \forall t > 0, \qquad (1)$$

where $r_i$ is the rate at which UE i is served in the current 
subframe, and $0 < \alpha_i < 1$ is a user specific constant. 
The user experience in subframe t is modeled as a strictly 
concave increasing function $U_i : \mathbb{R}_+ \to \mathbb{R}$ of the average rate 
$x_i(t)$. Traffic for applications such as file transfer and web 
browsing can be modeled by best effort flows, and is typically 
transferred over a TCP connection which has closed loop rate 
control. We greedily maximize the total utility at each time-step, 
i.e., the reward function for UE i with best effort traffic, 
at time t is [25]: 

$$f_i(r_i) = \frac{1}{\alpha_i}\, U_i\big( (1 - \alpha_i)\, x_i(t) + \alpha_i r_i \big). \qquad (2)$$

If we set $f_i(r_i) = U_i'(x_i(t))\, r_i$, and let $\alpha_i \to 0$ in equation (1), 
the resulting scheduler is identical to that in [3]. Thus, our 



analysis offers a computationally efficient method to implement 
the scheduling policy in [3] for the LTE uplink with 
fractional power control; also note that it is easy to show 
that the rate vectors in the uplink resource allocation problem 
satisfy the conditions required for the results in [3]. 
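
The following sketch illustrates the averaging update and the best effort reward numerically. It assumes the reward has the form $U_i((1-\alpha_i)x_i(t) + \alpha_i r_i)/\alpha_i$ and uses a logarithmic utility purely as an example; both the utility choice and the function names are assumptions of this sketch. For small $\alpha_i$ the reward difference between two rates is close to $U_i'(x_i(t))$ times the rate difference, which is the linearized weight.

```python
import math

def update_average_rate(x, r, alpha):
    """Exponentially weighted average-rate update: (1 - alpha) x + alpha r."""
    return (1.0 - alpha) * x + alpha * r

def best_effort_reward(r, x, alpha, utility=math.log):
    """Reward of serving rate r now, assuming f(r) = U((1 - alpha) x + alpha r) / alpha."""
    return utility(update_average_rate(x, r, alpha)) / alpha

if __name__ == "__main__":
    x, alpha = 2.0, 1e-4
    exact = best_effort_reward(3.0, x, alpha) - best_effort_reward(1.0, x, alpha)
    linearized = (3.0 - 1.0) / x  # U'(x) * (r2 - r1) for U = log
    print(exact, linearized)
```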

B. Delay QoS Traffic 

Here the user experience is a function of the packet delays. 
User experience is acceptable when the packet delays are lower 
than a certain tolerable value. The packet arrival process is 
assumed to be independent of the times at which the packets 
are served. Traffic for applications such as voice calls and live 
video chatting fall in this category. 

At time t, let $\pi_i(t)$ be the number of packets in the queue 
of UE i. Denote the sizes and the delays of these $\pi_i(t)$ packets 
by $\{s_i(1), \ldots, s_i(\pi_i(t))\}$ and $\{d_i(1), \ldots, d_i(\pi_i(t))\}$. Then for 
a UE i with delay QoS traffic, we define the reward function 
as: 

$$f_i(r_i) = \sum_{j=1}^{n_i^{serv}(r_i)} s_i(j)\, d_i(j) + d_i\big(n_i^{serv}(r_i) + 1\big) \Big( r_i \Delta - \sum_{j=1}^{n_i^{serv}(r_i)} s_i(j) \Big), \qquad (3)$$

where $\Delta$ is the length of a subframe (1 ms) and $n_i^{serv}(r_i)$ is the 
number of packets from UE i served fully if UE i is scheduled 
at rate $r_i$, i.e., 

$$n_i^{serv}(r_i) = \max \Big\{ k : \sum_{j=1}^{k} s_i(j) \le r_i \Delta \Big\}.$$

Lemma 3.1: $f_i(r_i)$ is a continuous concave function. 
Proof: Concavity follows from the observation that 
$d_i(1) \ge \ldots \ge d_i(\pi_i(t))$ and continuity is immediate from 
the definition. ■ 
Fig. 2. Example reward function for delay QoS flow. 

Example: Consider a UE with delay QoS traffic and four 
packets in the queue with delays (in ms) at time t given by 
$d_1 = 120$, $d_2 = 76$, $d_3 = 27$, $d_4 = 3$, and packet sizes (in 
KB) $s_1 = 1.5$, $s_2 = 0.7$, $s_3 = 2.1$, $s_4 = 3$. The 
corresponding reward function $f_i$ is shown in Fig. 2. 
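
The reward for this example can be evaluated with a few lines of code. The sketch below takes the amount of data served in the subframe (the product of rate and subframe length, in KB) and accumulates size times delay over the packets in queue order, counting a partially served packet in proportion to the amount served. The function name is an assumption of this sketch.

```python
def delay_reward(served, sizes, delays):
    """Piecewise-linear delay reward: each unit of data served from a packet
    earns that packet's delay; packets are served in queue order."""
    reward, remaining = 0.0, served
    for size, delay in zip(sizes, delays):
        part = min(size, remaining)
        reward += delay * part
        remaining -= part
        if remaining <= 0.0:
            break
    return reward

if __name__ == "__main__":
    sizes = [1.5, 0.7, 2.1, 3.0]         # KB, from the example
    delays = [120.0, 76.0, 27.0, 3.0]    # ms, from the example
    for served in (0.0, 1.0, 1.5, 2.2, 4.3, 7.3):
        print(served, delay_reward(served, sizes, delays))
```

The slope of the reward drops from 120 to 76, 27, and 3 at the packet boundaries, which is why the function is concave but not differentiable.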

C. Iterative Queue and Delay Based Policies 

If we restrict the model in [11] to frequency flat fading, 
i.e., a user is either connected to no server or all servers at 
any time, the algorithm in that paper can be interpreted as 
one which approximately maximizes the reward function in 
equation (3). Specifically, the matching algorithm reduces to 
one where in each iteration a server is allocated to a user with 
the highest head-of-line delay times spectral efficiency - this 
approximately equalizes the head-of-line delay times spectral 
efficiency for all users after the allocation, which (as we will 
show) is the optimality condition to maximize the reward 
function in (3) when divisible servers are considered. The larger 
the number of servers the same bandwidth B is divided into, 
the closer the approximation. Note that peak power constraints 
are not modeled in [11]. When frequency selective fading is 
considered, i.e., a user may be connected to a subset of servers 
in the model in [11], there is a sequence of maximum weight 
matchings which will approximately compute a solution which 
maximizes the reward function in (3). Motivated by this 
interpretation, we consider the maximization of the reward 
function in (3) for a much more general model with multiple 
rate options, peak power constraints, and different transmit 
PSD constraints on different sub-bands. We also note that 
the complexity of the algorithm in [11] is $O(NR^2)$ for N 
users and R RBs - when there are multiple RBs in each sub-band 
of bandwidth B, the complexity of our algorithms is 
lower. Finally, similar connections can be drawn between the 
scheme in [12] for frequency flat fading and using an objective 
function based on sums of squares of queues as in [26]; the 
connections for the frequency selective fading case seem to 
exist but are harder to analyze. 

IV. Estimation of Packet Delays 

We now describe a method to infer approximate packet 
delays at the eNB via the mechanisms available in LTE. We 
use the SR and BSR report generated at the start of the burst 
of packets, and periodic BSR reports which are generated 
regularly but transmitted only when resources are allocated 
to the UE (see Sec. II-B), along with the scheduling decisions 
made by the base-station, to estimate packet delays. The main 
intuition is as follows: if the base-station estimates the queue 
length at time t to be say, 1000 bytes, but later decodes a BSR 
which was created at time t and has value 1300 bytes, the 
base-station can deduce that 300 bytes arrived between time 
t and the time at which the previous BSR was created. This 
information about the time interval during which the 300 bytes 
arrived can be used for making resource allocation decisions - 
specifically, scheduling policies based on packet delays can be 
implemented. The main complexity is due to re-transmissions 
which can lead to the BSR report arriving out of order at the 
base-station. 

Let $T_{retx}$ be the maximum amount of time between the first 
transmission of a MAC packet and the latest time when it can 
be re-transmitted for H-ARQ (for example, if we configure 
6 as the maximum number of re-transmissions, $T_{retx} = 48$ 
subframes). We estimate the number of bytes that arrived, 
$A_i(t)$, in each subframe t. The buffer status reports are denoted 
by a sequence of random three tuples: 

$$\{B_i(1), \tau_i(1), \delta_i(1)\},\ \{B_i(2), \tau_i(2), \delta_i(2)\},\ \ldots$$

where $B_i(1)$ is the buffer size reported in the first BSR, $\tau_i(1)$ is 
the time at which the first BSR was received, and $(\tau_i(1) - \delta_i(1))$ 
is the time at which the first BSR was generated, and so on. 
Ci (t) denotes the number of bytes scheduled for transmission 
from UE i, Ci (t) the number of bytes which were successfully 
received from UE i, and Fi (t) the number of bytes that failed 
the final re-transmission for UE i, at time t. 

We maintain the history of estimated queue length for each 
UE i for duration $T_{retx}$, denoted by $Q_i(t - T_{retx} : t)$. Then, 
we update the Q matrix and the arrival vector $A_i$ at each t as 
follows: 

For every t, i: 

1) Scheduled Bytes: $Q_i(t) = Q_i(t-1) - C_i(t)$. 

2) Failed Bytes: $Q_i(t) = Q_i(t) + F_i(t)$. 

3) BSR report: If a BSR report is received at time t, i.e., 
there is n such that $\tau_i(n) = t$, then update the queue state 
as follows. If the base-station has not received any BSR 
report created after time $t - \delta_i(n)$, then 

$$Q_i(t - \delta_i(n) : t) = Q_i(t - \delta_i(n) : t) + A_i(t - \delta_i(n)),$$

where the arrival $A_i(t - \delta_i(n)) = B_i(n) - Q_i(t - \delta_i(n))$; 
otherwise, for 

$$m = \arg\min_{\{m \,:\, \tau_i(m) < t\}} \big[ \tau_i(m) - \delta_i(m) - (\tau_i(n) - \delta_i(n)) \big],$$

update 

$$A_i(t - \delta_i(n)) = B_i(n) - Q_i(t - \delta_i(n)),$$

$$A_i(\tau_i(m) - \delta_i(m)) = A_i(\tau_i(m) - \delta_i(m)) - A_i(t - \delta_i(n)),$$

$$Q_i(t - \delta_i(n) : \tau_i(m) - \delta_i(m) - 1) = Q_i(t - \delta_i(n) : \tau_i(m) - \delta_i(m) - 1) + A_i(t - \delta_i(n)).$$

Note that $Q_i$ can have negative entries. 
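
A minimal sketch of the update rules, restricted to the simpler case in which BSRs are decoded in the order they were created, may help make the bookkeeping concrete. The out-of-order branch is deliberately omitted, the history is kept in a dictionary rather than a window of length $T_{retx}$, and the class and method names are assumptions of this sketch.

```python
class QueueEstimator:
    """Base-station estimate of one UE's queue length (in-order BSRs only)."""

    def __init__(self):
        self.queue = {}     # subframe -> estimated queue length in bytes
        self.arrivals = {}  # BSR creation subframe -> inferred new bytes

    def step(self, t, scheduled=0, failed=0):
        """Steps 1 and 2: carry the estimate forward, subtract scheduled
        bytes, and add back bytes that failed their final re-transmission."""
        self.queue[t] = self.queue.get(t - 1, 0) - scheduled + failed

    def on_bsr(self, t, created, reported):
        """Step 3 (in-order case): a BSR created in subframe `created` is
        decoded in subframe t; call after step(t)."""
        arrival = reported - self.queue[created]
        self.arrivals[created] = arrival
        for u in range(created, t + 1):
            self.queue[u] += arrival
        return arrival

if __name__ == "__main__":
    est = QueueEstimator()
    for t in range(0, 3):
        est.step(t)
    est.on_bsr(2, created=0, reported=1000)
    for t in range(3, 7):
        est.step(t)
    print(est.on_bsr(6, created=4, reported=1300), est.arrivals)
```

In the usage example the estimate stands at 1000 bytes when a BSR created in subframe 4 reports 1300 bytes, so 300 bytes are attributed to an arrival no later than subframe 4.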

V. Frequency Flat Fading 

Here, we consider the resource allocation to N UEs over a 
single sub-band with bandwidth B and frequency flat fading. 
We drop the dependence of quantities in the general model on 
the sub-band j - for example, we denote the channel gain from 
UE i to the eNB as $G_i$. We allow for contiguous allocation 
- this is a reasonable approximation when B is larger than 
a few RBs. Rounding techniques in, for example, [9] can be 
used to obtain integral solutions. The optimization problem to 
maximize the sum of rewards for all UEs over the bandwidth 
allocation vector $b \in \mathbb{R}_+^N$ in a subframe is: 

$$\begin{aligned} \max \quad & \sum_{i=1}^{N} f_i\!\left( b_i\, \psi\!\left( \frac{G_i \min(\gamma_i b_i, P)}{I\, b_i} \right) \right) \\ \text{s.t.} \quad & 0 \le b_i \le b_i^{max},\ \forall i, \qquad \sum_{i=1}^{N} b_i \le B, \end{aligned} \qquad (4)$$

where $b_i^{max}$ is the maximum bandwidth that UE i can use based 
on the estimated queue length, $Q_i(t)$, for UE i, and satisfies: 

$$b_i^{max}\, \psi\!\left( \frac{G_i \min(\gamma_i b_i^{max}, P)}{I\, b_i^{max}} \right) = Q_i(t)/\Delta,$$

where we recall that $\Delta$ is the length of a subframe (1 ms). 
Since the function on the left is an increasing function of 
$b_i^{max}$, we can compute $b_i^{max}$ efficiently via a bisection search. 
Problem (4) is a convex optimization problem (with a non-differentiable 
objective function) due to the lemma which follows. 

Lemma 5.1: The objective function in optimization problem (4) 
is concave in the $b_i$'s for $b_i > 0$, for all i. 

Proof: Consider the function $g : \mathbb{R}_+ \to \mathbb{R}_+$ defined by 
$g(x) = x\, \psi(c/x)$, $\forall x > 0$, where $c \in \mathbb{R}_+$ is constant. Since $\psi$ is 
assumed to be concave, it is easy to verify (via showing that 
the second derivative is non-positive) that g is concave as 
well. Since (i) the sum of concave functions is concave, and 
(ii) the composition of an increasing concave function with a 
concave function is concave, to show that the objective function 
is concave, it is sufficient to show that the following function is concave: 

$$h(x) = x\, \psi\!\left( \frac{\min(c_1 x, c_2)}{x} \right), \quad \forall x > 0, \ c_1, c_2 \in \mathbb{R}_+ \text{ are constant}.$$

Note that the above function is well defined for $x > 0$. 
Since $\psi$ is an increasing function, we can write $h(x) = 
\min\{ x\, \psi(c_1),\ x\, \psi(c_2/x) \}$, which is the minimum of two 
concave functions, and hence, concave. ■ 

A. Characterization of Optimal Solution 

We define a function which maps the bandwidth allocation 
$b_i$ to the achievable rate for user i: 

$$h_i(b_i) = b_i\, \psi\!\left( \frac{G_i \min(\gamma_i b_i, P)}{I\, b_i} \right).$$

We denote the sub-differential of a function $g : \mathbb{R} \to \mathbb{R}$ at x by 
$\partial g(x)$. For continuous concave functions over the set of reals, 
the subdifferential at x is the set of slopes of lines tangent to 
g at x. 

Let $b^* \in \mathbb{R}_+^N$ denote the solution to the resource allocation 
problem (4). The following lemma shows that an optimal 
allocation in a given subframe is one for which the following 
quantities are equal for all users with non-zero bandwidth 
allocation: for a best effort user, the marginal utility times the 
incremental rate when more bandwidth is allocated to it, and 
for a delay QoS user, the delay of the oldest packet which is 
not served completely times the incremental rate when more 
bandwidth is allocated to it. 

Lemma 5.2: There exists a $\lambda^* > 0$ such that if i is best 
effort, then 

$$\lambda^* \in U_i'\big( (1 - \alpha_i)\, x_i(t) + \alpha_i r_i^* \big)\, \partial h_i(b_i^*), \quad \text{if } b_i^* > 0,$$

$$\lambda^* \ge U_i'\big( (1 - \alpha_i)\, x_i(t) \big) \min \partial h_i(0), \quad \text{if } b_i^* = 0;$$

else, if i is delay QoS and $b_i^* > 0$, 

• if $\sum_{j=1}^{n_i^{serv}(r_i^*)} s_i(j) < r_i^* \Delta$, then $\lambda^* \in d_i\big(n_i^{serv}(r_i^*) + 1\big)\, \partial h_i(b_i^*)$; 

• else, if $\sum_{j=1}^{n_i^{serv}(r_i^*)} s_i(j) = r_i^* \Delta$, then 

$$\lambda^* \in \big[\, d_i(n_i^{serv} + 1) \min \partial h_i(b_i^*),\ \ d_i(n_i^{serv}) \max \partial h_i(b_i^*) \,\big];$$

else, if i is delay QoS and $b_i^* = 0$, 

$$\lambda^* \ge d_i(1) \min \partial h_i(0),$$

where $r_i^* = h_i(b_i^*)$. 

Proof: The lemma follows from standard arguments in, 
for example, [27], the definitions of the $f_i$'s, and that the 
subdifferential of $f_i$ for delay QoS user i is given by 

$$\partial f_i(r_i) = \begin{cases} \{ d_i(n_i^{serv} + 1) \}, & \sum_{j=1}^{n_i^{serv}} s_i(j) < r_i \Delta \\ [\, d_i(n_i^{serv} + 1),\ d_i(n_i^{serv}) \,], & \sum_{j=1}^{n_i^{serv}} s_i(j) = r_i \Delta. \end{cases}$$

We now evaluate the sub-differential of $h_i$ for $x > 0$, which 
is bounded because $\gamma_i$ is assumed to be bounded: 

$$\partial h_i(x) = \begin{cases} \left\{ \psi\!\left( \frac{G_i \gamma_i}{I} \right) \right\}, & \text{if } x < P/\gamma_i \\ \left\{ \psi\!\left( \frac{G_i P}{I x} \right) - \frac{G_i P}{I x}\, \psi'\!\left( \frac{G_i P}{I x} \right) \right\}, & \text{if } x > P/\gamma_i. \end{cases}$$

At $x = P/\gamma_i$, the sub-differential is the interval between these two values. 
Fig. 3. Optimality condition 

We illustrate the optimality condition via a two user example. 
The total bandwidth to be shared is 10 RBs, or 1800 kHz. 
All packets are of size 500 bits. The packet delays of the two 
users in the given subframe are 

User 1: [450, 330, 135, 80, 20] 
User 2: [170, 150, 140, 110, 80, 20] 

The rates at which the users can be served as a function of the 
bandwidth allocations (in kHz) are given by: 

$$h_1(b_1) = \begin{cases} b_1 \log_2\!\left( 1 + 10^{0.05} \right), & b_1 \le 5 \times 180\ \text{kHz} \\ b_1 \log_2\!\left( 1 + 10^{0.05}\, \dfrac{5 \times 180}{b_1} \right), & b_1 > 5 \times 180\ \text{kHz} \end{cases}$$

$$h_2(b_2) = \begin{cases} b_2 \log_2\!\left( 1 + 10^{0.4} \right), & b_2 \le 8 \times 180\ \text{kHz} \\ b_2 \log_2\!\left( 1 + 10^{0.4}\, \dfrac{8 \times 180}{b_2} \right), & b_2 > 8 \times 180\ \text{kHz} \end{cases}$$

where the 5 and 8 RB thresholds (and corresponding SINRs 
of 0.5 dB and 4 dB) are derived from the fractional power control 
constraints in Section II-A. The subgradients of the rewards 
for both the users as a function of bandwidth allocation, and 
the optimal bandwidth allocation, are shown in Fig. 3 - the 
optimal resource allocation is 5 RBs to each user, and the 
optimal dual variable $\lambda^*$ is shown in the figure. For each user, 
the figure also shows the number of RBs required to fully 
serve a given number of packets and the number of RBs at 
which the user becomes power limited, i.e., the maximum peak 
power constraint limits the transmission power rather than the 
fractional power control which limits the transmit PSD. 



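B. Computation of Optimal Solution 

The optimization problem (4) entails the maximization of 
the sum of concave functions subject to a linear inequality 
constraint. While, in principle, the optimal resource allocation 
scheme can be computed via a bisection search on the dual 
variable $\lambda$, two difficulties arise: (i) There may be multiple 
values of $b_i$ for which the subgradient of $f_i \circ h_i$ is equal to 
$\lambda$. See, for example, the first packet for user 1 in Fig. 3. As a 
result, the dual function is non-differentiable and the bisection 
search may not converge [28]. (ii) If $\lambda$ belongs to the sub-differential 
at a point $b_i$ of non-differentiability of either $f_i$ 
or $h_i$, the values of the gradient of $f_i \circ h_i$ may be arbitrarily 
different at $(b_i + \epsilon)$ and $(b_i - \epsilon)$ for an arbitrarily small $\epsilon$. This 
can also be seen in Fig. 3. We use Algorithm 1 to compute the 
optimal solution of problem (4). The convergence analysis is 
almost identical to that in Sec. 6 in [28]. An accurate solution 
can typically be computed in about 10 iterations. 

Algorithm 1: Bisection search for optimal $\lambda$ 

Given starting values of $\underline{\lambda}$, $\bar{\lambda}$, $\underline{b}$, $\bar{b}$ and tolerance $\epsilon$. 
repeat 
    Bisect: $\lambda = (\underline{\lambda} + \bar{\lambda})/2$. 
    Allocate bandwidth for all i: 
    if $\lambda > \max \partial f_i(0) \times \max \partial h_i(0)$ then 
        set $b_i = 0$. 
    else 
        $b_i$ is such that 

        $$\lambda \in \big[ \min \partial f_i(r_i) \times \min \partial h_i(b_i),\ \max \partial f_i(r_i) \times \max \partial h_i(b_i) \big] \qquad (5)$$

        where $r_i = b_i\, \psi\!\left( \dfrac{G_i \min(\gamma_i b_i, P)}{I\, b_i} \right)$. 
    end 
    Update: if $\sum_{i=1}^{N} b_i - B > 0$, $\underline{\lambda} = \lambda$, $\underline{b} = b$, else 
    $\bar{\lambda} = \lambda$, $\bar{b} = b$. 
until $|\bar{\lambda} - \underline{\lambda}| < \epsilon$ 
Feasible Solution: if $\sum_i \underline{b}_i - \sum_i \bar{b}_i > 0$ then 
    set $\alpha = \dfrac{B - \sum_i \bar{b}_i}{\sum_i \underline{b}_i - \sum_i \bar{b}_i}$. 
else 
    set $\alpha = 0$. 
end 
$b = \alpha \underline{b} + (1 - \alpha) \bar{b}$ 

The starting values of $\underline{\lambda}$ and $\bar{\lambda}$ can be generated using 
the following simple lemma (proof is straightforward and 
omitted); the values of $\underline{b}$ and $\bar{b}$ are obtained by repeating the 
Allocate Bandwidth step in Algorithm 1 for dual variables $\underline{\lambda}$ 
and $\bar{\lambda}$, respectively. 

Lemma 5.3: The optimal dual variable $\lambda^*$ satisfies 
$\underline{\lambda} \le \lambda^* \le \bar{\lambda}$, where 

$$\bar{\lambda} = \max_{i = 1, \ldots, N} \ \psi\!\left( \frac{G_i(t)\, \gamma_i}{I} \right) \max \partial f_i(0),$$

$$\underline{\lambda} = \left[ \psi\!\left( \frac{G_i(t) P}{I B} \right) - \frac{G_i(t) P}{I B}\, \psi'\!\left( \frac{G_i(t) P}{I B} \right) \right] \times \max \partial f_i\!\left( B\, \psi\!\left( \frac{G_i(t) P}{I B} \right) \right) \quad \text{for some } i.$$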



The main computational step in each iteration of Algorithm 1 
entails solving (5) N times - we now show this can be 
done in $O(\log L)$ time when the reward function $f_i$ for user 
i is non-differentiable at at most L points. The composition 
of function $f_i$ with $h_i$ is a concave function as shown in 
Lemma 5.1. Hence, to compute the bandwidth allocation for 
UE i as given in equation (5), we can use a bisection on 
$b_i$. First we obtain how many packets should be served fully 
such that the corresponding bandwidth required, $b_i$, satisfies 
equation (5) in $O(\log L)$ time. Then, we compute $b_i$. 

We compute the range of subgradients for packet $\eta$ as 

$$\underline{b} = h_i^{-1}\!\left( \frac{1}{\Delta} \sum_{l=1}^{\eta - 1} s_i(l) \right), \qquad \bar{b} = h_i^{-1}\!\left( \frac{1}{\Delta} \sum_{l=1}^{\eta} s_i(l) \right), \qquad (6)$$

$$SG(\eta) = d_i(\eta)\, \big[ \min \partial h_i(\bar{b}),\ \min \partial h_i(\underline{b}) \big],$$

where we recall that $d_i(\eta)$ and $s_i(\eta)$ are the delay and size for the $\eta$th 
packet queued at UE i. Note that the inverse of $h_i$ is simple 
when $b_i \le P/\gamma_i$; otherwise it can be computed via bisection. 

Algorithm 2: Bisection for Number of Packets 

Initialize: $\underline{\eta} = 1$, $\bar{\eta} = \pi_i$, where we recall $\pi_i$ is the 
number of packets queued at UE i. 
repeat 
    1. Bisect: $\eta = \lfloor (\underline{\eta} + \bar{\eta})/2 \rfloor$. 
    2. Compute subgradient range $SG(\eta)$. 
    3. Update: If $\min SG(\eta) > \lambda$, then $\underline{\eta} := \eta$, else if 
    $\max SG(\eta) < \lambda$, then $\bar{\eta} := \eta$, else $\underline{\eta}, \bar{\eta} := \eta$. 
until $\underline{\eta} = \bar{\eta}$ 

The number of packets to be served completely is $\eta - 1$. 
Now we show how to compute the bandwidth allocation $b_i$. 
Note that $h_i$ has at most one point of non-differentiability, say $\tilde{b}_i$. If 
$\underline{b} < \tilde{b}_i < \bar{b}$ for this $\eta$ in (6), then $b_i = \tilde{b}_i$ if $\lambda / d_i(\eta) \in 
\partial h_i(\tilde{b}_i)$; else update $\underline{b}$ or $\bar{b}$ appropriately. 

Algorithm 3: Computation of RB assignment 

Given tolerance $\mu$, $\underline{b}$, $\bar{b}$. 
repeat 
    1. Bisect: $b = (\underline{b} + \bar{b})/2$. 
    2. Update: If $h_i'(b) > \lambda / d_i(\eta)$, then $\underline{b} := b$, else if 
    $h_i'(b) < \lambda / d_i(\eta)$, then $\bar{b} := b$. 
until $|\bar{b} - \underline{b}| < \mu$ 

A similar method can be used for best effort traffic and the 
analysis is omitted here due to lack of space. 
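
The overall procedure can be sketched numerically for delay QoS users. The sketch below is a simplification, not the $O(\log L)$ method described above: for each value of the dual variable it finds a user's bandwidth by plain bisection on the marginal reward (delay of the packet in service times the slope of the rate function), then bisects on the dual variable and interpolates between the two bracketing allocations as in the feasibility step of Algorithm 1. The Shannon rate model, the dictionary layout of a user, and all function names are assumptions of this sketch; bandwidth is in kHz, so rates are in bits per 1 ms subframe.

```python
import math

def rate(b, user):
    """Bits per subframe for b kHz: PSD limited below b_thr, power limited above."""
    if b <= 0.0:
        return 0.0
    return b * math.log2(1.0 + user["snr"] * min(1.0, user["b_thr"] / b))

def rate_slope(b, user):
    """Right derivative of the rate function at b."""
    if b < user["b_thr"]:
        return math.log2(1.0 + user["snr"])
    s = user["snr"] * user["b_thr"] / b
    return math.log2(1.0 + s) - s / ((1.0 + s) * math.log(2.0))

def marginal(b, user):
    """Right derivative of the reward in b: delay of the packet in service times rate slope."""
    bits, served = rate(b, user), 0.0
    for size, delay in zip(user["sizes"], user["delays"]):
        served += size
        if served > bits + 1e-9:
            return delay * rate_slope(b, user)
    return 0.0  # queue fully drained

def bisect(predicate, lo, hi, iters=80):
    """Largest point in [lo, hi] where a true-then-false predicate still holds."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if predicate(mid):
            lo = mid
        else:
            hi = mid
    return lo

def allocate(lam, user, cap):
    """Largest bandwidth in [0, cap] whose marginal reward is at least lam."""
    if marginal(0.0, user) < lam:
        return 0.0
    if marginal(cap, user) >= lam:
        return cap
    return bisect(lambda b: marginal(b, user) >= lam, 0.0, cap)

def schedule(users, total_bw, tol=1e-9):
    """Bisection on the dual variable, then interpolation to a feasible allocation."""
    caps = []  # per-user maximum useful bandwidth (drains the queue), capped at total_bw
    for u in users:
        need = sum(u["sizes"])
        if rate(total_bw, u) <= need:
            caps.append(total_bw)
        else:
            caps.append(bisect(lambda b: rate(b, u) < need, 0.0, total_bw))
    if sum(caps) <= total_bw:
        return caps
    lam_lo, lam_hi = 0.0, 1.0 + max(marginal(0.0, u) for u in users)
    b_lo, b_hi = caps, [0.0] * len(users)
    while lam_hi - lam_lo > tol:
        lam = 0.5 * (lam_lo + lam_hi)
        b = [allocate(lam, u, c) for u, c in zip(users, caps)]
        if sum(b) > total_bw:
            lam_lo, b_lo = lam, b
        else:
            lam_hi, b_hi = lam, b
    gap = sum(b_lo) - sum(b_hi)
    alpha = (total_bw - sum(b_hi)) / gap if gap > 0 else 0.0
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(b_lo, b_hi)]
```

For the two user example of Section V-A (SINRs of 0.5 dB and 4 dB, thresholds of 900 kHz and 1440 kHz, 500 bit packets, 1800 kHz in total), this sketch returns 900 kHz, i.e., 5 RBs, for each user.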
Parameter                            Value 
Channel Profile                      ITU-PedA 
Mobile Speed                         5 km/hr 
Log-Normal Shadowing                 σ = 8.9 dB 
Intra-site Shadowing Correlation     1.0 
Inter-site Shadowing Correlation     0.5 
Cell Radius                          1 km 
No. of UEs/cell                      20 
No. of RBs                           110 
Max UE Tx Power                      23 dBm 
No. of Tx & Rx Antenna               1 
eNB & UE Antenna Gains               0 dBi 
Thermal Noise Density                -174 dBm/Hz 
BSR periodicity                      5 ms 
Max. number of retransmissions       6 

TABLE I 
Simulation Parameters 

VI. Simulation Results 
A. Simulation Framework 

The algorithms in the previous section were simulated using 
a detailed system simulator where the MAC layer signalling 
was modeled faithfully, and the PHY layer performance 
was abstracted via modeling of fading channels, transmission 
power, and capacity computations as in [29], [15]. A hexagonal 
regular cell layout with three sectors per site was simulated 
with the parameters as noted in Table I. For fractional power 
control parameter values ($P_0 = -60$ dBm, $\alpha = 0.6$) similar to 
those in [8], a 19 cell (57 sector) simulation with wrap around 
was first performed to determine the interference over thermal 
(IoT) at the base-station of a cell to be 6 dB on average. 
In subsequent simulations, only one cell was simulated with 
the IoT assumed to be constant in time and frequency. This 
drastically reduces the simulation time while still accounting 
for the inter-cell interference. 
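To make the effect of these power control parameters concrete, the sketch below evaluates a simplified open-loop rule, min(P_max, P_0 + alpha * PL + 10 log10(number of RBs)), in dBm. It assumes that P_0 is a per-RB target, that closed-loop and MCS-dependent offsets are ignored, and that the UE power cap is 23 dBm; the function names are hypothetical and the rule is a simplification rather than the simulator's exact model.

```python
import math


def tx_power_dbm(path_loss_db, num_rbs, p0_dbm=-60.0, alpha=0.6, p_max_dbm=23.0):
    """Simplified open-loop transmit power in dBm (assumed model, see lead-in)."""
    unclipped = p0_dbm + alpha * path_loss_db + 10.0 * math.log10(num_rbs)
    return min(p_max_dbm, unclipped)


def max_rbs_at_target_psd(path_loss_db, p0_dbm=-60.0, alpha=0.6, p_max_dbm=23.0):
    """Largest RB count for which the UE can still hold its per-RB target power."""
    headroom_db = p_max_dbm - (p0_dbm + alpha * path_loss_db)
    return math.floor(10.0 ** (headroom_db / 10.0))


# Path loss values spanning the macro-cell range used in the simulations.
for pl in (100.0, 120.0, 135.0):
    print(pl, max_rbs_at_target_psd(pl))
```

Under these assumptions a UE at 135 dB path loss becomes power limited after a single RB, whereas a UE at 100 dB can fill the whole band at its target PSD, which is why the peak power constraint matters mainly for cell-edge users.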

The time varying channel gains, the $G_i$'s, were assumed to be measured perfectly at the base station in each subframe. The MCS was picked on the basis of the channel gain from the UE, and a rate adaptation algorithm was used to target an average of two H-ARQ transmissions for successful decoding. We use the mutual information effective SINR metric (MIESM) [30]: we first obtain the effective SINR according to the modulation alphabet size and then use that value to simulate a packet loss event according to the packet error rate for the effective SINR. We model the timelines for Scheduling Request (SR), resource grants, Hybrid-ARQ, ACK/NACKs, and BSR as described in Sec. II. We assume error free transmission of control messages in our simulations.

We focus on delay QoS traffic and consider two different models [31]. Live Video: An ON-OFF Markov process with fixed packet size is used for the live video traffic model. The Markov process dwells in either state for 2 seconds and, when in the ON state, generates a packet every 20 ms. Streaming Video: Here, both the packet interarrival times and the packet sizes are independently drawn from truncated Pareto distributions. The number of arrivals in a frame length of 100 milliseconds is fixed at 8, while their interarrival times are drawn from a truncated Pareto distribution with exponent 1.2 and truncation to [2.5 ms, 12.5 ms]. We use an exponent of 0.7 for the packet size distribution with varying values for the truncation limits, so as to control the mean data rate. For example, to get a mean rate of 500 kbps, we fix the limits to [215 bytes, 1500 bytes].
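The 500 kbps figure can be checked with a short calculation. The sketch below assumes that "truncated" means clipping each Pareto draw at the upper limit (values above the limit are set to the limit), which is an assumption about the traffic model and not something stated in the text; the function name is hypothetical.

```python
def clipped_pareto_mean(alpha, lower, upper):
    """Mean of min(X, upper) for X ~ Pareto(shape=alpha, scale=lower), alpha != 1."""
    ratio = lower / upper
    # Contribution of draws that fall below the upper limit.
    body = alpha * lower / (alpha - 1.0) * (1.0 - ratio ** (alpha - 1.0))
    # Contribution of draws clipped to the upper limit.
    atom = upper * ratio ** alpha
    return body + atom


PACKETS_PER_SECOND = 8 / 0.100  # 8 packets per 100 ms frame

mean_size_bytes = clipped_pareto_mean(0.7, 215.0, 1500.0)
mean_rate_bps = 8.0 * mean_size_bytes * PACKETS_PER_SECOND
mean_gap_ms = clipped_pareto_mean(1.2, 2.5, 12.5)

print(round(mean_size_bytes), round(mean_rate_bps), round(mean_gap_ms, 2))
```

Under the clipping assumption the mean packet size is roughly 782 bytes, which at 80 packets per second gives a mean rate close to 500 kbps, in line with the example limits quoted above.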

In order to map the optimal resource allocation computed using Algorithm 1 to actual RB grants, we use a heuristic which ranks the users in decreasing order of marginal reward times spectral efficiency when a very small amount of bandwidth is given to the user. The RB allocation is then done in the order of the rank, with each computed bandwidth amount mapped to an available segment of the closest size.
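One possible reading of this mapping heuristic is sketched below. The data layout (free segments as start and length pairs), the tie-breaking rule, and the choice to return leftover RBs to the free pool are assumptions made for illustration and are not specified in the text.

```python
def map_to_rb_grants(requests, free_segments):
    """Map ranked bandwidth requests to contiguous RB segments (hypothetical sketch).

    requests: list of (user, score, rbs_wanted), where score stands for
              marginal reward times spectral efficiency.
    free_segments: list of (start_rb, length) describing free contiguous RBs.
    """
    segments = list(free_segments)
    grants = {}
    for user, _score, wanted in sorted(requests, key=lambda r: r[1], reverse=True):
        if wanted <= 0 or not segments:
            continue
        # Pick the free segment whose length is closest to the requested amount.
        idx = min(range(len(segments)), key=lambda k: abs(segments[k][1] - wanted))
        start, length = segments.pop(idx)
        granted = min(wanted, length)
        grants[user] = (start, granted)
        if length > granted:
            # Return the unused tail of the segment to the free pool.
            segments.append((start + granted, length - granted))
    return grants


requests = [("ue_a", 3.0, 6), ("ue_b", 9.0, 10), ("ue_c", 1.0, 4)]
grants = map_to_rb_grants(requests, [(0, 12), (20, 5)])
print(grants)
```

Because each user receives a single contiguous segment, a sketch of this kind stays compatible with the contiguity requirement of SC-FDMA, at the cost of granting less than the computed bandwidth when no segment is large enough.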

B. Results 

We consider two topologies for simulation: a macro-cell with the path loss between the base station and UEs randomly selected between 100 dB and 135 dB [29], and a micro-cell with path loss in the range 107 dB to 115 dB. We simulate three scheduling algorithms: (i) Iterative Delay, which maximizes the reward function in Sec. III-B; (ii) Iterative Queue, which minimizes the sum of squares of queue lengths as in [26] and similar to [12]; (iii) non-iterative maximum weight, where the UE with the highest queue length times spectral efficiency for the first RB is allocated bandwidth until its queue is drained or the UE becomes power limited, before allocation moves to the next UE. We note that the computational algorithms in this paper are applicable to computing resource allocation for scheduling policies (i) and (ii), and that policies similar to (iii) do not consider the change in the reward function of the UE within a given subframe.
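For reference, the non-iterative baseline (iii) can be sketched as a simple greedy loop. The sketch assumes a constant number of bits per RB for each UE and represents the power limit as a cap on the number of RBs; both are simplifications introduced here, and all names are hypothetical.

```python
def greedy_max_weight(ues, total_rbs):
    """Greedy allocation in decreasing order of queue length times bits per RB.

    ues: dict mapping UE name -> (queue_bits, bits_per_rb, max_rbs_power),
         all integers; max_rbs_power is the RB count at which the UE becomes
         power limited (assumed given).
    """
    # Weights are computed once and never updated within the subframe.
    order = sorted(ues, key=lambda u: ues[u][0] * ues[u][1], reverse=True)
    remaining = total_rbs
    allocation = {}
    for name in order:
        if remaining == 0:
            break
        queue_bits, bits_per_rb, max_rbs_power = ues[name]
        rbs_to_drain = -(-queue_bits // bits_per_rb)  # ceiling division
        rbs = min(rbs_to_drain, max_rbs_power, remaining)
        if rbs > 0:
            allocation[name] = rbs
            remaining -= rbs
    return allocation


ues = {"a": (4000, 400, 50), "b": (9000, 300, 12), "c": (1000, 500, 50)}
allocation = greedy_max_weight(ues, total_rbs=25)
print(allocation)
```

The fixed ordering is the point of contrast with policies (i) and (ii): the weight of a UE is not revised as it receives bandwidth within the subframe.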

1) Macro cell Topology: We consider 20 UEs with a mix of live video and streaming video traffic. Since live video has a tighter requirement for packet delays, we bias the scheduler to assign live video users 5x priority compared to streaming video users for the same packet delay. Simulations were performed for low load and high load cases:

(1) High Load: 5 UEs have live video traffic, each with a mean rate of 300 kbps. For the other 15 UEs with streaming video traffic, we mimic an adaptive-rate streaming mechanism in which the data rate for each user depends on the quality of its channel to the base station, i.e., a user close to the base station transmits a better quality video compared to a cell-edge user. For simulating high load, the truncation parameters mentioned in Section VI-A are varied for each UE such that they generate traffic at 80% of the average data rate they received with full buffer traffic.

(2) Low Load: 5 UEs have live video traffic with a mean rate of 200 kbps. The UEs with streaming video traffic are now set to operate at 40% of their full buffer average data rate.

We first study the performance of the delay estimation mechanism described in Section IV. Figure 4 shows the estimated head of line (HoL) delay and the actual HoL delay at a UE over a period of 1 second. The estimated values can be seen to follow the actual delays, but the accuracy is limited by the granularity of BSR messages: if multiple packets arrive between two successive BSR messages, they are bundled as one in our mechanism, resulting in relatively small errors in HoL estimation.
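The bundling effect described here can be illustrated with a small estimator that sees only queue lengths. This is a hypothetical simplification written for this example, not the mechanism of Section IV: it timestamps any growth in the reported backlog at the BSR arrival time, serves bytes oldest bundle first, and ignores the quantization of real BSR values.

```python
from collections import deque


class HolDelayEstimator:
    """Tracks bundles of bytes inferred from successive queue-length reports."""

    def __init__(self):
        # Each entry is [estimated_arrival_ms, bytes_left].
        self.bundles = deque()

    def on_grant_served(self, served_bytes):
        """Remove bytes the UE has transmitted, oldest bundle first."""
        while served_bytes > 0 and self.bundles:
            head = self.bundles[0]
            used = min(served_bytes, head[1])
            head[1] -= used
            served_bytes -= used
            if head[1] == 0:
                self.bundles.popleft()

    def on_bsr(self, now_ms, reported_bytes):
        """Treat growth over the tracked backlog as one bundle arriving now."""
        tracked = sum(b[1] for b in self.bundles)
        if reported_bytes > tracked:
            self.bundles.append([now_ms, reported_bytes - tracked])

    def hol_delay_ms(self, now_ms):
        return now_ms - self.bundles[0][0] if self.bundles else 0


est = HolDelayEstimator()
est.on_bsr(0, 1000)      # first report: 1000 bytes queued
est.on_bsr(5, 1800)      # 800 new bytes, however many packets they were
print(est.hol_delay_ms(12))
```

All packets that arrive between two reports share one timestamp in this sketch, so the estimation error for any of them is bounded by the BSR period (5 ms in Table I).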
Fig. 4. HoL delay estimation performance
Fig. 5. Live video users: delay performance



Next we show the performance of the head of line delay based scheduling scheme, computed as the solution to the optimization problem with the delay based reward function. Figure 5 shows the median and 95th percentile delays of the live video UEs for the two baseline schedulers and the head of line delay based scheduler at low and high loads. The delays experienced by the live video users are consistently lower with HoL delay based scheduling: the non-iterative scheme results in an average 95th percentile delay 1.6x higher than with HoL delay scheduling, and the queue based scheme also results in slightly higher delays, on average 1.1x the 95th percentile delays for HoL scheduling. A more pronounced improvement is observed for the streaming video users, as shown in the delay plots in Figure 6. In this case, the non-iterative and queue based schemes result in 6.2x and 5x more delay compared to HoL delay scheduling in terms of 95th percentile latencies. Finally, Figure 7 shows the combined delay numbers for uplink packets from all the UEs in the high load simulation. As can be seen from the figure, the iterative queue based and delay based schemes result in similar delays for live video users due to preferential assignment. However, this results in large delays for the streaming video users for both the non-iterative and queue based schemes: close to 11x and 8x respectively compared to HoL delay based scheduling in terms of 95th percentile delays. Thus, leveraging the approximate packet delays obtained via our method leads to significant performance improvement over queue based scheduling. Moreover, even for the queue based scheduler, the computational methods in this paper remain applicable.

2) Micro cell Topology: In order to compare these scheduling schemes in a smaller cell topology, we ran a second simulation with 20 UEs located within a region with path loss of 107-115 dB from the base station. Each UE in this simulation carries streaming video traffic with the mean data rate randomly selected between 300 and 2000 kbps. Decoupling the mean traffic rate from the path loss highlights the relative performance of the scheduling algorithms in real deployments, where prior knowledge of user demand is rarely available. Individual and cell-wide delay numbers are shown in Figure 8, which shows that the 95th percentile delays for the non-iterative and queue based schemes are 1.8x and 1.4x more than those for HoL delay based scheduling.

[Figure 6: median and 95th percentile delays (log scale) for Non-Iterative, Queue Based, and HoL Delay Based Scheduling; panels: Low Load Simulation, High Load Simulation]

Fig. 6. Streaming video users: delay performance
Fig. 7. Cell-wide delay performance of all packets in macro cell simulation
Fig. 8. Individual and Cell-wide Delay performance for micro cell simulation

VII. Frequency Selective Resource Allocation

We extend the analysis in [10] for frequency selective fading to concave functions $f_i$ (such as the delay based reward function) which are thrice continuously differentiable everywhere except at $L$ points where they are only continuous. We can re-write such a function as

$$f_i(r_i) = \sum_{l=1}^{L} f_{il}\big(\min(\rho_l - \rho_{l-1},\ [r_i - \rho_{l-1}]_+)\big)$$

where $0 < \rho_1 < \dots < \rho_L$ are the points of non-differentiability, $\rho_0 = 0$, and $f_{il} : \mathbb{R}_+ \to \mathbb{R}$ are thrice continuously differentiable concave functions defined as

$$f_{il}(x) = f_i(\rho_{l-1} + x) - f_i(\rho_{l-1}), \quad x \in [0, \rho_l - \rho_{l-1}],\ l \ge 1,$$

and satisfy

$$f'_{il}(x) \le f'_{i(l-1)}(y), \quad l > 1,\ x \in [0, \rho_l - \rho_{l-1}],\ y \in [0, \rho_{l-1} - \rho_{l-2}].$$

We also assume $x\,\psi^{-1}(y/x)$ is convex for all $(x, y) > 0$; this is true, for example, when $\psi$ is the Shannon capacity formula, and for practical M-QAM schemes.

Consider the following convex optimization problem over the $r_{il}$'s, the $r_{ij}$'s (rate for user $i$ on sub-band $j$), and the $b_{ij}$'s (bandwidth for user $i$ on sub-band $j$):

$$
\begin{aligned}
\max\quad & \sum_{i=1}^{N}\sum_{l=1}^{L} f_{il}(r_{il}) \\
\text{s.t.}\quad & \sum_{l=1}^{L} r_{il} = \sum_{j=1}^{M} r_{ij}\ \ \forall i, \qquad r_{il} \le \rho_l - \rho_{l-1}\ \ \forall i, l, \\
& \sum_{i=1}^{N} b_{ij} \le [\ldots]\ \ \forall j, \\
& \sum_{j=1}^{M} \frac{b_{ij}}{G_{ij}}\,\psi^{-1}\!\left(\frac{r_{ij}}{b_{ij}}\right) \le P\ \ \forall i, \\
& [\ldots], \qquad r_{il},\ r_{ij},\ b_{ij} \ge 0.
\end{aligned}
$$

The first constraint implies that the total rate for a user is the sum of rates over sub-bands, the second constraint is on the total bandwidth allocation in a sub-band, the third constraint is on peak power at the UE in a subframe, and the fourth constraint models fractional power control. The following lemma follows easily from the construction of the $f_{il}$'s:

Lemma 7.1: If $(r^*, b^*)$ is a solution to the optimization problem above, then $\sum_i f_i\big(\sum_j r^*_{ij}\big)$ is the maximum sum reward for any feasible resource allocation.

General purpose interior point methods to solve the above optimization problem have a complexity of $O((NM + NL)^3)$ per iteration; we exploit the structure to reduce it to $O(N(L^2 + M^2))$. Note that in practice $L$ and $M$ are much smaller than $N$. In order to construct a solution for which the bandwidth allocation is contiguous in frequency to satisfy the SC-FDMA requirements, we can use the heuristic in [10]. The main computation is to determine the Newton step at each iteration, which entails solving a set of linear equations (we omit the details due to lack of space; the exact expressions can be obtained following the steps in [32]) in the unknowns $x = (x_1, \dots, x_N)$ and $y$, whose coefficient matrix consists of the block-diagonal matrix $\mathrm{diag}(H_1, \dots, H_N)$ bordered by $A^T$ and $A$, where $H_i \in \mathbb{R}^{(L+M)\times(L+M)}$ and $A \in \mathbb{R}^{M \times N(L+M)}$. We first eliminate the $x_i$'s, expressing each $x_i$ through $H_i^{-1}$ and $A^T_{(L+M)(i-1)+1:(L+M)i}\,y$, where $A^T_{k:m}$ is the submatrix of $A^T$ given by rows $k$ to $m$. We invert $H_i$ in $O(L^2 + M^2)$ time, solve for $y$ in $O(M^3)$ time ($M$ linear equations in $M$ variables), and back-substitute $y$ to obtain $x$. To invert $H_i$, we note that it decomposes into a sum that includes the rank-one terms $g_i g_i^T$ and $h_i h_i^T$, where $g_i \in \mathbb{R}^{L+M}$. Using the matrix inversion lemma we can invert $H_i$ in $O(L^2 + M^2)$ time.
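The role of the matrix inversion lemma can be illustrated on the simplest case, a diagonal matrix plus a single rank-one term, where the Sherman-Morrison formula gives a linear-time solve. The structure assumed below (a positive diagonal `d` and one vector `g`) is chosen purely for illustration and is not claimed to be the exact form of $H_i$; the function name is hypothetical.

```python
def solve_diag_plus_rank_one(d, g, rhs):
    """Solve (diag(d) + g g^T) x = rhs in O(n) time via Sherman-Morrison.

    Uses (D + g g^T)^{-1} = D^{-1} - D^{-1} g g^T D^{-1} / (1 + g^T D^{-1} g).
    Assumes every entry of d is nonzero and the denominator is nonzero.
    """
    dinv_rhs = [r / di for r, di in zip(rhs, d)]
    dinv_g = [gi / di for gi, di in zip(g, d)]
    denom = 1.0 + sum(gi * zi for gi, zi in zip(g, dinv_g))
    scale = sum(gi * yi for gi, yi in zip(g, dinv_rhs)) / denom
    return [yi - scale * zi for yi, zi in zip(dinv_rhs, dinv_g)]


d = [2.0, 3.0, 4.0]
g = [1.0, 0.5, -1.0]
rhs = [1.0, 2.0, 3.0]
x = solve_diag_plus_rank_one(d, g, rhs)

# Residual check: multiply back by diag(d) + g g^T.
gx = sum(gi * xi for gi, xi in zip(g, x))
residual = [di * xi + gi * gx - ri for di, xi, gi, ri in zip(d, x, g, rhs)]
print(max(abs(v) for v in residual))
```

Applying the same identity once per low-rank term avoids a general dense factorization, which is the kind of saving that makes a per-user cost well below cubic in $L + M$ possible.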



VIII. Conclusions

We designed a general computational framework in this paper to enable a wide array of online scheduling policies in a computationally efficient manner. We modeled the constraints due to fractional power control, and formulated an optimization problem with a non-differentiable objective function. We showed how to estimate the packet delays on the uplink via the BSR reports, and proposed a novel scheduling policy based on packet delays. Numerical results demonstrated that using packet delay estimates for the uplink can lead to a significant reduction in packet delays as compared with a queue length based scheduler. There are many interesting directions for future work. For example, we can further study the connections with the work in [11], [12]. In terms of implementation, an interesting question is whether we can design approximation algorithms for the uplink bandwidth packing problem which are optimal according to some metric.